Skip to content

Download: make it retry more times and log the first error from many#1603

Merged
yarikoptic merged 5 commits intomasterfrom
enh-download
Apr 3, 2025
Merged

Download: make it retry more times and log the first error from many#1603
yarikoptic merged 5 commits intomasterfrom
enh-download

Conversation

@yarikoptic
Copy link
Copy Markdown
Member

See individual commits for more information.

Some users do experience odd timeouts from S3. Most likely due to some
local effects, but I think we should strive more to ensure we try our
best to download. So we will keep retrying more times.
I think we "over-optimized" minimization of messages in some prior PR
and it is hard or impossible to find actual error or even for which file/zarr.
So I decided to add an explicit error log line to ease debugging
@yarikoptic yarikoptic added patch Increment the patch version when merged cmd-download labels Apr 3, 2025
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves the download functionality by logging the first encountered error for easier debugging and increasing the number of retry attempts to enhance robustness.

  • Logs the first error message before raising a RuntimeError.
  • Increases retry attempts from 3 to 10 in the download function.
Comments suppressed due to low confidence (2)

dandi/download.py:217

  • [nitpick] Consider whether logging only the first error provides sufficient context for debugging; if additional context could be useful, logging more or all errors might be helpful.
error_msg = f"Encountered {pluralize(len(errors), 'error')} while downloading."

dandi/download.py:747

  • [nitpick] Consider extracting the magic number '10' into a well-named constant (e.g., MAX_DOWNLOAD_ATTEMPTS) to improve code clarity and maintainability.
10  # number to do, could be incremented if we downloaded a little

@yarikoptic yarikoptic added the release Create a release when this pr is merged label Apr 3, 2025
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 3, 2025

Codecov Report

Attention: Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review.

Project coverage is 88.64%. Comparing base (b1815da) to head (09d3371).
Report is 110 commits behind head on master.

Files with missing lines Patch % Lines
dandi/download.py 88.88% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1603      +/-   ##
==========================================
+ Coverage   88.54%   88.64%   +0.10%     
==========================================
  Files          78       78              
  Lines       10872    10872              
==========================================
+ Hits         9627     9638      +11     
+ Misses       1245     1234      -11     
Flag Coverage Δ
unittests 88.64% <90.90%> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@candleindark candleindark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let comment not related to the changes you put in but about some existing code for downloading.

@candleindark
Copy link
Copy Markdown
Member

I updated the test. This PR is good to go. I still see the following connection failure in one of the tests.

connection failure
================================== FAILURES ===================================
____________________________ test_follow_redirect _____________________________
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connectionpool.py:787: in urlopen
    response = self._make_request(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connectionpool.py:488: in _make_request
    raise new_e
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connectionpool.py:464: in _make_request
    self._validate_conn(conn)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connectionpool.py:1093: in _validate_conn
    conn.connect()
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connection.py:741: in connect
    sock_and_verified = _ssl_wrap_socket_and_match_hostname(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connection.py:920: in _ssl_wrap_socket_and_match_hostname
    ssl_sock = ssl_wrap_socket(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\util\ssl_.py:460: in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\util\ssl_.py:504: in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\ssl.py:513: in wrap_socket
    return self.sslsocket_class._create(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\ssl.py:1071: in _create
    self.do_handshake()
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\ssl.py:1342: in do_handshake
    self._sslobj.do_handshake()
E   ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\requests\adapters.py:667: in send
    resp = conn.urlopen(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connectionpool.py:841: in urlopen
    retries = retries.increment(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\util\retry.py:474: in increment
    raise reraise(type(error), error, _stacktrace)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\util\util.py:38: in reraise
    raise value.with_traceback(tb)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connectionpool.py:787: in urlopen
    response = self._make_request(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connectionpool.py:488: in _make_request
    raise new_e
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connectionpool.py:464: in _make_request
    self._validate_conn(conn)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connectionpool.py:1093: in _validate_conn
    conn.connect()
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connection.py:741: in connect
    sock_and_verified = _ssl_wrap_socket_and_match_hostname(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connection.py:920: in _ssl_wrap_socket_and_match_hostname
    ssl_sock = ssl_wrap_socket(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\util\ssl_.py:460: in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\util\ssl_.py:504: in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\ssl.py:513: in wrap_socket
    return self.sslsocket_class._create(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\ssl.py:1071: in _create
    self.do_handshake()
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\ssl.py:1342: in do_handshake
    self._sslobj.do_handshake()
E   urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

During handling of the above exception, another exception occurred:
dandi\tests\test_dandiarchive.py:461: in test_follow_redirect
    url = follow_redirect("https://bit.ly/dandi12")
dandi\dandiarchive.py:894: in follow_redirect
    r = requests.head(url, allow_redirects=True)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\requests\api.py:100: in head
    return request("head", url, **kwargs)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\requests\api.py:59: in request
    return session.request(method=method, url=url, **kwargs)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\requests\sessions.py:589: in request
    resp = self.send(prep, **send_kwargs)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\requests\sessions.py:724: in send
    history = [resp for resp in gen]
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\requests\sessions.py:724: in <listcomp>
    history = [resp for resp in gen]
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\requests\sessions.py:265: in resolve_redirects
    resp = self.send(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\requests\sessions.py:703: in send
    r = adapter.send(request, **kwargs)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\requests\adapters.py:682: in send
    raise ConnectionError(err, request=request)
E   requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError([1005](https://github.com/dandi/dandi-cli/actions/runs/14249292407/job/39937930698?pr=1603#step:8:1006)4, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
=============================== tests coverage ================================
______________ coverage: platform win32, python 3.10.11-final-0 _______________

Coverage XML written to file coverage.xml
============================ slowest 10 durations =============================
75.12s call     dandi/tests/test_dandiarchive.py::test_follow_redirect
71.61s setup    dandi/cli/tests/test_service_scripts.py::test_reextract_metadata
19.13s call     dandi/tests/test_metadata.py::test_bids_nwb_metadata_integration
8.48s call     dandi/tests/test_organize.py::test_organize_nwb_test_data[1-dry]
3.90s call     dandi/tests/test_organize.py::test_organize_nwb_test_data[1-symlink]
3.90s call     dandi/tests/test_organize.py::test_organize_nwb_test_data[1-hardlink]
3.90s call     dandi/tests/test_organize.py::test_organize_nwb_test_data[1-copy]
3.89s call     dandi/tests/test_organize.py::test_organize_nwb_test_data[1-auto]
3.88s call     dandi/tests/test_organize.py::test_organize_nwb_test_data[1-simulate]
3.27s setup    dandi/cli/tests/test_cmd_validate.py::test_validate_nwb_path_grouping
=========================== short test summary info ===========================
FAILED dandi/tests/test_dandiarchive.py::test_follow_redirect - requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
= 1 failed, 451 passed, 282 skipped, 1 xfailed, 9 xpassed in 284.33s (0:04:44) =

The failure is coming from

dandi\tests\test_dandiarchive.py:461: in test_follow_redirect
    url = follow_redirect("https://bit.ly/dandi12")
dandi\dandiarchive.py:894: in follow_redirect
    r = requests.head(url, allow_redirects=True)

and it is unrelated to this PR.

@yarikoptic
Copy link
Copy Markdown
Member Author

thanks for fixing up tests @candleindark

ok, we might be done with it, unless someone would want to add a unittest to actually add coverage for failing to download attempts testing -- apparently that is not covered at all :-/ but I have no spare cycles atm, so may be later. final blessing @candleindark ?

@yarikoptic
Copy link
Copy Markdown
Member Author

windows fail - filed

@yarikoptic yarikoptic merged commit d6eb2c1 into master Apr 3, 2025
28 of 30 checks passed
@yarikoptic yarikoptic deleted the enh-download branch April 3, 2025 20:17
@yarikoptic
Copy link
Copy Markdown
Member Author

oh it was already approved -- merging ;)

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 3, 2025

🚀 PR was released in 0.67.3 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cmd-download patch Increment the patch version when merged release Create a release when this pr is merged released

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants